perm filename INDIA.PRO[ESS,JMC] blob sn#059031 filedate 1973-08-16 generic text, type T, neo UTF8
00100	        A Proposal for Getting the Scientific Literature into
00200	              Computer Form by Keyboarding it in India
00300	
00400	Background Facts
00500	
00600		1.  Eventually, there will be a  library  of  all the world's
00700	literature in a computer file readable  on  consoles  from  anywhere.
00800	This is discussed more fully in
00900	
01000		2.  For the first time there is a file capable of holding
01100	large numbers of books at a reasonable cost, namely the trillion-bit
01200	Precision Instruments Laser File.  It could store 250,000 books at a
01300	cost of $4.00 each.
01400	
01500		3.  One of these files will be connected to the ARPA network
01600	of research organizations in information-processing techniques and
01700	will be available in March 1972 to the ARPA-supported projects and
01800	some additional research organizations.
01900	
02000		4.  It has been proposed as an experiment to put a library of
02100	computer science on this file.
02200	
02300		5.  There exist at present about 100 display consoles
02400	suitable for reading literature on this file, but more can be added at
02500	about $700.00 apiece.
02600	
02700		6.  The largest expense in creating this library  is  getting
02800	the  information  into  computerized form.  The cost of doing this is
02900	said to be 75 cents to $1.00 per thousand characters  by  keyboarding
03000	in  the U.S.   Some OCR techniques promise 25 cents per thousand with
03100	a reduction to 10 cents for a large enough job.
03200	
03300		7.  Wages in India are about 1/10 of those in the U.S.
03400	
03500		8.  The  U.S.  has  about  a billion dollars in Indian rupees 
03600	from PL 480 sales of grain that cannot be converted into dollars.
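
	The figures in facts 2 and 6 can be combined into a rough cost per
book.  The sketch below is only illustrative: the 8 bits per character is
an assumption not stated above, and the variable names are invented for
the example.

```python
# Figures taken from background facts 2 and 6.
FILE_CAPACITY_BITS = 10**12       # "trillion-bit" laser file
BOOKS_PER_FILE = 250_000
STORAGE_COST_PER_BOOK = 4.00      # dollars

# Assumption (not in the text): one character occupies 8 bits.
BITS_PER_CHAR = 8

bits_per_book = FILE_CAPACITY_BITS // BOOKS_PER_FILE        # 4,000,000 bits
chars_per_book = bits_per_book // BITS_PER_CHAR             # 500,000 characters
implied_file_cost = BOOKS_PER_FILE * STORAGE_COST_PER_BOOK  # 1,000,000 dollars


def entry_cost_per_book(dollars_per_thousand_chars):
    """Cost in dollars of getting one book into computer form."""
    return chars_per_book / 1000 * dollars_per_thousand_chars


# Dollars per thousand characters, as quoted in fact 6.
rates = {
    "U.S. keyboarding, low": 0.75,
    "U.S. keyboarding, high": 1.00,
    "OCR": 0.25,
    "OCR, large job": 0.10,
}

for method, rate in rates.items():
    print(f"{method}: ${entry_cost_per_book(rate):.2f} per book")
```

	Under these assumptions, entering a book costs between $50 and
$500, against $4.00 to store it, which is consistent with fact 6.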
03700	
03800		On the basis  of  the  above  facts  we  have  the  following
03900	proposal:
04000	
04100		1.  The  U.S. spend about $1,000,000 to $2,000,000 in dollars
04200	and  from  $10,000,000  to $20,000,000 in blocked rupees to put books
04300	and reports, primarily technical, into computerized form.
04400	
04500		2.  The project be carried out  in  Bombay and be directed by
04600	personnel from the Tata Institute of Fundamental Research.
04700	
04800		3.  The Tata Institute do the R & D associated with the
04900	project.  This includes developing machine formats for the different
05000	kinds of textual information, mathematical formulas, pictures, and
05100	diagrams, and also display formats.  It also includes techniques for
05200	making sure the work is done correctly by proofreading, verifying,
05300	or computer syntax checking.
05400	
05500		4.  Tata gets a PDP-10 computer with multi-console display
05600	equipment to do the research.  The actual keyboarding is done either
05700	on keypunches (purchasable for rupees) or on time-shared PDP-11's
05800	with keyboards.
05900	
06000		5.  The benefit to the U.S. is that we get the literature into
06100	computerized form cheaply.
06200	
06300		6.  The benefits to India are a) they pay off some of the
06400	debt, b) they get a substantial R & D contract in a leading area of
06500	computer science, and c) they get a first-class computer science
06600	research computer.
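
	Item 3 mentions "verifying" as one way of checking the work.  The
proposal does not say how this would be done.  One common reading,
assumed here, is that the same page is keyed twice by different operators
and the two versions are compared.  The function name and sample text
below are hypothetical.

```python
from itertools import zip_longest


def verify_keyings(first, second):
    """Compare two independent keyings of the same page.

    Returns a list of (line_number, first_text, second_text) for every
    line on which the two keyings disagree.  A line present in only one
    keying is reported with None for the missing side.
    """
    mismatches = []
    pairs = zip_longest(first.splitlines(), second.splitlines(), fillvalue=None)
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            mismatches.append((number, a, b))
    return mismatches


# Hypothetical sample: the second operator dropped a digit on line 2.
keying_a = "Wages in India are low.\nThe file holds 250,000 books."
keying_b = "Wages in India are low.\nThe file holds 25,000 books."

for number, a, b in verify_keyings(keying_a, keying_b):
    print(f"line {number}: {a!r} vs {b!r}")
```

	Lines on which the keyings agree are accepted.  Lines that differ
would go to a proofreader, so human attention is spent only where an
error is likely.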
06700	
06800		Some answers to questions.
06900	
07100		1.  How much literature can be punched?  At $3.00 x 10^-3 per
07200	character, $10^7 gives 3 x 10^10 chars = 60,000 books.
07400	
07500		2.   What  about  copyright?  Permission of copyright holders
07600	should be requested on the basis that no payment be made for  putting
07700	the  information  into  the  system;  payment is for reference to the
07800	information on the basis of usage at rates to  be  negotiated  later;
07900	probably  a  flat  rate per look at a page for old material and rates
08000	set by the copyright holder after a library system is operational for
08100	new  material. If the copyright holder refuses, his material will not
08200	be entered.  If he wants to put it into the library later, he will
08300	have to do so at his own expense.  Few will refuse.
08500	
08600		3.  Can the project be scaled down?  The cost of the R & D to
08700	develop internal formats and means of supervision places a  limit  on
08800	how much the project could be scaled down and still be meaningful.
08900	
09000		4.  Who  has to agree?  ARPA, most  likely  PSAC,  the  State
09100	Department, the Office of Management and Budget, maybe Congress, the
09200	Government of India and the Tata Institute.
09300	
09400		5.  What  variations are   possible?   A   different   Indian
09500	organization, the scale of the project, a different country with  low
09600	wages where  the  U.S.  has  blocked currency and where the necessary
09700	computer competence exists,  the  extent  of  participation  of  U.S.
09800	organizations in the R & D and supervision, possible use of a private
09900	Indian concern,  doing  the  whole  job  by  OCR  (optical  character
10000	recognition by machine).
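
	The arithmetic in answer 1 can be checked with a short sketch.  It
reads the rate there as 3.00 x 10^-3 dollars per character and takes the
book size implied by the stated totals.  The variable names are invented
for the example.

```python
# Figures as stated in answer 1.
BUDGET_DOLLARS = 10**7
COST_PER_CHAR = 3.00e-3          # dollars per character, as written
CHARS_STATED = 3 * 10**10
BOOKS_STATED = 60_000

# Book size implied by the stated totals.
chars_per_book = CHARS_STATED // BOOKS_STATED                 # 500,000

# What the stated rate buys for the stated budget.
chars_at_stated_rate = BUDGET_DOLLARS / COST_PER_CHAR         # about 3.3 x 10^9
books_at_stated_rate = chars_at_stated_rate / chars_per_book  # about 6,700

# Rate that would be needed to reach the stated character total.
implied_cost_per_char = BUDGET_DOLLARS / CHARS_STATED         # about 3.3 x 10^-4

print(f"characters per book:      {chars_per_book:,}")
print(f"chars at stated rate:     {chars_at_stated_rate:.2e}")
print(f"books at stated rate:     {books_at_stated_rate:,.0f}")
print(f"rate implied by 3x10^10:  ${implied_cost_per_char:.2e} per char")
```

	As written, the rate and the character total differ by a factor of
ten.  A rate of about 3 x 10^-4 dollars per character (roughly 30 cents
per thousand) would give the stated 3 x 10^10 characters and 60,000
books.  Which figure was intended cannot be determined from the text.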